A 'Silent Trial' Assessing the Accuracy of Large Language Models for Assisting Community Health Workers in Low-Resource Settings
This study, which evaluated large language models (LLMs) in Rwanda, found that OpenAI's o3 matched the high referral accuracy of local community health workers, whereas Google's Gemini 2.5 Flash performed poorly. The results suggest that model selection is critical, and that LLMs may offer limited immediate benefit in well-established programs but could support less mature initiatives.
Shimelash, N., Rutunda, S., Menon, V., Emmanual-Fabula, M., Uwimbabazi, A., Rugege, C., Nshimiyimana, C., Rwema, I., Kandekwe, M., Berhe, D. F. D., Wong, R., Remera, E., Hezagira, E., Gill, J., Archer (…)
2026-02-17 · primary care research